智能论文笔记

A Feedforward Unitary Equivariant Neural Network

Pui-Wai Ma , T. -H. Hubert Chan

分类：机器学习

2022-08-25

我们设计了一种新型的前馈神经网络。相对于统一组$ u（n）$，它是均等的。输入和输出可以是$ \ mathbb {c}^n $的向量，并具有任意尺寸$ n $。我们的实施中不需要卷积层。我们避免因傅立叶样转换中的高阶项截断而导致错误。可以使用简单的计算有效地完成每一层的实现。作为概念的证明，我们已经对原子运动动力学的预测给出了经验结果，以证明我们的方法的实用性。

translated by 谷歌翻译

Towards Personalized Healthcare in Cardiac Population: The Development of a Wearable ECG Monitoring System, an ECG Lossy Compression Schema, and a ResNet-Based AF Detector

Wei-Ying Yi , Peng-Fei Liu , Sheung-Lai Lo , Ya-Fen Chan , Yu Zhou , Yee Leung , Kam-Sang Woo , Alex Pui-Wai Lee , Jia-Min Chen , Kwong-Sak Leung

分类：人工智能

2022-07-11

心血管疾病（CVD）是全球死亡的第一大原因。尽管有越来越多的证据表明心房颤动（AF）与各种CVD有着密切的关联，但这种心律不齐通常是使用心电图（ECG）诊断的，这是一种无风险，无侵入性和具有成本效益的工具。在任何威胁生命的疾病/疾病发展之前，不断和远程监视受试者的心电图信息迅速诊断和及时对AF进行预处理的潜力。最终，可以降低CVD相关的死亡率。在此手稿中，展示了体现可穿戴心电图设备，移动应用程序和后端服务器的个性化医疗系统的设计和实施。该系统不断监视用户的心电图信息，以提供个性化的健康警告/反馈。用户能够通过该系统与他们的配对健康顾问进行远程诊断，干预措施等。已经评估了实施的可穿戴ECG设备，并显示出极好的一致性（CVRMS = 5.5％），可接受的一致性（CVRMS = CVRMS = CVRMS = 12.1％），可忽略不计的RR间隙错误（<1.4％）。为了提高可穿戴设备的电池寿命，提出了使用ECG信号的准周期特征来实现压缩的有损压缩模式。与公认的架构相比，它在压缩效率和失真方面优于其他模式，并在MIT-BIH数据库中以ECG信号的某个PRD或RMSE达到了至少2倍的Cr。为了在拟议系统中实现自动化AF诊断/筛查，开发了基于重新系统的AF检测器。对于2017年Physionet CINC挑战的ECG记录，该AF探测器获得了平均测试F1 = 85.10％和最佳测试F1 = 87.31％，表现优于最先进。

translated by 谷歌翻译

DSR: Direct Simultaneous Registration for Multiple 3D Images

Zhehua Mao , Liang Zhao , Shoudong Huang , Yiting Fan , Alex Pui-Wai Lee

分类：计算机视觉 | 机器人

2021-05-21

本文提出了一种名为Direct同时注册（DSR）的新颖算法，该算法以同时方式注册3D图像的集合，而无需指定任何参考图像，特征提取和匹配，或信息丢失或重复使用。该算法通过最大化预定义的全景图像和本地图像之间的相似性来优化本地图像框架的全局姿势。尽管我们将问题作为直接捆绑调整（DBA），通过研究求解过程中全景图像的独立性，可以共同优化全景图像的姿势和全景图像的强度，但提议DSR求解DSR仅姿势并被证明能够获得与DBA相同的最佳姿势。所提出的方法特别适用于无法获得不同特征的场景，例如经食管超声心动图（TEE）图像。通过通过模拟和体内3D TEE图像将其与四种广泛使用的方法进行比较来评估DSR。结果表明，所提出的方法在准确性方面优于这四种方法，并且所需的计算资源要比最先进的累积成对估计（APE）少得多。

translated by 谷歌翻译

A Survey On Few-shot Knowledge Graph Completion with Structural and Commonsense Knowledge

Haodi Ma , Daisy Zhe Wang

分类：自然语言处理 | 人工智能 | 机器学习

2023-01-03

Knowledge graphs (KG) have served as the key component of various natural language processing applications. Commonsense knowledge graphs (CKG) are a special type of KG, where entities and relations are composed of free-form text. However, previous works in KG completion and CKG completion suffer from long-tail relations and newly-added relations which do not have many know triples for training. In light of this, few-shot KG completion (FKGC), which requires the strengths of graph representation learning and few-shot learning, has been proposed to challenge the problem of limited annotated data. In this paper, we comprehensively survey previous attempts on such tasks in the form of a series of methods and applications. Specifically, we first introduce FKGC challenges, commonly used KGs, and CKGs. Then we systematically categorize and summarize existing works in terms of the type of KGs and the methods. Finally, we present applications of FKGC models on prediction tasks in different areas and share our thoughts on future research directions of FKGC.

translated by 谷歌翻译

I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

Haoyu Ma , Xiangru Lin , Yizhou Yu

分类：计算机视觉

2023-01-03

Unsupervised domain adaptation (UDA) for semantic segmentation is a promising task freeing people from heavy annotation work. However, domain discrepancies in low-level image statistics and high-level contexts compromise the segmentation performance over the target domain. A key idea to tackle this problem is to perform both image-level and feature-level adaptation jointly. Unfortunately, there is a lack of such unified approaches for UDA tasks in the existing literature. This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation. Concretely, for image-level domain shifts, we propose a global photometric alignment module and a global texture alignment module that align images in the source and target domains in terms of image-level properties. For feature-level domain shifts, we perform global manifold alignment by projecting pixel features from both domains onto the feature manifold of the source domain; and we further regularize category centers in the source domain through a category-oriented triplet loss and perform target domain consistency regularization over augmented target domain images. Experimental results demonstrate that our pipeline significantly outperforms previous methods. In the commonly tested GTA5$\rightarrow$Cityscapes task, our proposed method using Deeplab V3+ as the backbone surpasses previous SOTA by 8%, achieving 58.2% in mIoU.

translated by 谷歌翻译

KoopmanLab: A PyTorch module of Koopman neural operator family for solving partial differential equations

Wei Xiong , Muyuan Ma , Pei Sun , Yang Tian

分类：机器学习

2023-01-03

Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs without analytic solutions inevitably suffers from the trade-off between accuracy and efficiency. Recent advances in neural operators, a kind of mesh-independent neural-network-based PDE solvers, have suggested the dawn of overcoming this challenge. In this emerging direction, Koopman neural operator (KNO) is a representative demonstration and outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and ERA5 (i.e., one of the largest high-resolution data sets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab to be considered in diverse applications of partial differential equations.

translated by 谷歌翻译

StyleTalk: One-shot Talking Head Generation with Controllable Speaking Styles

Yifeng Ma , Suzhen Wang , Zhipeng Hu , Changjie Fan , Tangjie Lv , Yu Ding , Zhidong Deng , Xin Yu

分类：计算机视觉

2023-01-03

Different people speak with diverse personalized speaking styles. Although existing one-shot talking head methods have made significant progress in lip sync, natural facial expressions, and stable head motions, they still cannot generate diverse speaking styles in the final talking head videos. To tackle this problem, we propose a one-shot style-controllable talking face generation framework. In a nutshell, we aim to attain a speaking style from an arbitrary reference speaking video and then drive the one-shot portrait to speak with the reference speaking style and another piece of audio. Specifically, we first develop a style encoder to extract dynamic facial motion patterns of a style reference video and then encode them into a style code. Afterward, we introduce a style-controllable decoder to synthesize stylized facial animations from the speech content and style code. In order to integrate the reference speaking style into generated videos, we design a style-aware adaptive transformer, which enables the encoded style code to adjust the weights of the feed-forward layers accordingly. Thanks to the style-aware adaptation mechanism, the reference speaking style can be better embedded into synthesized videos during decoding. Extensive experiments demonstrate that our method is capable of generating talking head videos with diverse speaking styles from only one portrait image and an audio clip while achieving authentic visual effects. Project Page: https://github.com/FuxiVirtualHuman/styletalk.

translated by 谷歌翻译

A New Perspective to Boost Vision Transformer for Medical Image Classification

Yuexiang Li , Yawen Huang , Nanjun He , Kai Ma , Yefeng Zheng

分类：计算机视觉 | 人工智能

2023-01-03

Transformer has achieved impressive successes for various computer vision tasks. However, most of existing studies require to pretrain the Transformer backbone on a large-scale labeled dataset (e.g., ImageNet) for achieving satisfactory performance, which is usually unavailable for medical images. Additionally, due to the gap between medical and natural images, the improvement generated by the ImageNet pretrained weights significantly degrades while transferring the weights to medical image processing tasks. In this paper, we propose Bootstrap Own Latent of Transformer (BOLT), a self-supervised learning approach specifically for medical image classification with the Transformer backbone. Our BOLT consists of two networks, namely online and target branches, for self-supervised representation learning. Concretely, the online network is trained to predict the target network representation of the same patch embedding tokens with a different perturbation. To maximally excavate the impact of Transformer from limited medical data, we propose an auxiliary difficulty ranking task. The Transformer is enforced to identify which branch (i.e., online/target) is processing the more difficult perturbed tokens. Overall, the Transformer endeavours itself to distill the transformation-invariant features from the perturbed tokens to simultaneously achieve difficulty measurement and maintain the consistency of self-supervised representations. The proposed BOLT is evaluated on three medical image processing tasks, i.e., skin lesion classification, knee fatigue fracture grading and diabetic retinopathy grading. The experimental results validate the superiority of our BOLT for medical image classification, compared to ImageNet pretrained weights and state-of-the-art self-supervised learning approaches.

translated by 谷歌翻译

P3DC-Shot: Prior-Driven Discrete Data Calibration for Nearest-Neighbor Few-Shot Classification

Shuangmei Wang , Rui Ma , Tieru Wu , Yang Cao

分类：计算机视觉

2023-01-02

Nearest-Neighbor (NN) classification has been proven as a simple and effective approach for few-shot learning. The query data can be classified efficiently by finding the nearest support class based on features extracted by pretrained deep models. However, NN-based methods are sensitive to the data distribution and may produce false prediction if the samples in the support set happen to lie around the distribution boundary of different classes. To solve this issue, we present P3DC-Shot, an improved nearest-neighbor based few-shot classification method empowered by prior-driven data calibration. Inspired by the distribution calibration technique which utilizes the distribution or statistics of the base classes to calibrate the data for few-shot tasks, we propose a novel discrete data calibration operation which is more suitable for NN-based few-shot classification. Specifically, we treat the prototypes representing each base class as priors and calibrate each support data based on its similarity to different base prototypes. Then, we perform NN classification using these discretely calibrated support data. Results from extensive experiments on various datasets show our efficient non-learning based method can outperform or at least comparable to SOTA methods which need additional learning steps.

translated by 谷歌翻译

Model-Driven Deep Learning for Non-Coherent Massive Machine-Type Communications

Zhe Ma , Wen Wu , Feifei Gao , Xuemin , Shen

分类：机器学习

2023-01-02

In this paper, we investigate the joint device activity and data detection in massive machine-type communications (mMTC) with a one-phase non-coherent scheme, where data bits are embedded in the pilot sequences and the base station simultaneously detects active devices and their embedded data bits without explicit channel estimation. Due to the correlated sparsity pattern introduced by the non-coherent transmission scheme, the traditional approximate message passing (AMP) algorithm cannot achieve satisfactory performance. Therefore, we propose a deep learning (DL) modified AMP network (DL-mAMPnet) that enhances the detection performance by effectively exploiting the pilot activity correlation. The DL-mAMPnet is constructed by unfolding the AMP algorithm into a feedforward neural network, which combines the principled mathematical model of the AMP algorithm with the powerful learning capability, thereby benefiting from the advantages of both techniques. Trainable parameters are introduced in the DL-mAMPnet to approximate the correlated sparsity pattern and the large-scale fading coefficient. Moreover, a refinement module is designed to further advance the performance by utilizing the spatial feature caused by the correlated sparsity pattern. Simulation results demonstrate that the proposed DL-mAMPnet can significantly outperform traditional algorithms in terms of the symbol error rate performance.

translated by 谷歌翻译